

Hybrid Memory Products Ltd

# **SRAM Module**

# Mean Time Between Failure Analysis (MTBF)

## Introduction

In general terms, the reliability of any plastic module assembly can be assessed by dividing the assembly into four critical areas: active devices, passive devices, substrates and interconnects.

Each is reviewed separately below:

# Active Device Reliability

Active device reliability is determined by the inherent reliability of the active components used and by the extent to which that inherent reliability is degraded during assembly and use.

# ♦ Inherent Component Reliability

Inherent active device reliability is determined by the original device manufacturer. Manufacturers carry out reliability tests (reliability monitors) on plastic product as an ongoing process monitor which allows component FIT (failure-in-time) rates to be calculated. As the part matures and more data is accumulated, the confidence level in the calculated FIT rate increases.

# ♦ Factors Affecting Inherent Component Reliability

In order to avoid degrading the inherent reliability of plastic components the following factors need to be considered:

# 1. Moisture absorption

Plastic Encapsulated Microcircuits can absorb moisture if not stored, handled or used properly. Moisture absorbed during storage or manufacture can lead to the so-called "popcorn effect" (i.e. fracturing of the plastic encapsulant due to the rapid expansion of entrapped moisture) during the soldering process. Any ionic contaminants remaining after the manufacturing process, or deposited during field use, may cause corrosion of internal metal surfaces. In addition, long-term moisture intrusion can mobilize residual ionic materials, initiating or accelerating this corrosion.

However, results published by plastic microcircuit suppliers suggest that moisture takes decades of use to penetrate to the die (provided the encapsulant is intact).

As packages get smaller, the resin coating over the die becomes thinner, which tends to reduce moisture resistance; improvements in resin performance have tended to offset this degradation.

# 2. Mechanical or thermal overstressing during processing

Plastic packages can be damaged by exposure to high temperatures. Therefore the processing temperatures used in module assembly should be as low, and of as short a duration, as possible (and certainly within the component manufacturer's guidelines).

# 3. Power dissipation

The reliability of any semiconductor device is heavily dependent on the junction temperatures reached during operation: the higher the temperature, the lower the reliability. Therefore self-heating effects and the surrounding ambient temperature should be controlled to keep the junction temperature below 150°C.

# 4. Soft error rate

In plastic devices, alpha particles can be generated by impurities in the resin. Levels tend to be low, and buffer coatings between the die and resin absorb most of the particles. In modules, the soft error rate would be expected to be no worse than that of the plastic packaged devices used.

If it is assumed that module processing does not degrade the inherent reliability of the components, the manufacturers' published test data can be used to calculate component FIT or MTBF. The method used for calculating these figures is described in *Appendix 1*.

# Passive Device Reliability

The only passive devices used on plastic modules are ceramic chip capacitors used for de-coupling purposes. These devices are procured against MIL specifications and assembled to substrates using standard surface mount processes.

The contribution that passive components make to module MTBF figures is several orders of magnitude less than that of the active components, so it can usually be ignored.

# Substrate Reliability

Substrates generally used for modules are Printed Circuit Boards (PCBs). Typically the boards are 6-layer multilayer (4 signal, power and ground) with FR4 base laminate. Provided these boards are operated within their specification window, their potential for failure is minimal.

# Interconnect Reliability

Solder joints provide the majority of mechanical connection strength and thermal and electrical conduction paths within modules as they are used to connect components and leadframe to the substrate. Reliability of these joints is influenced by a variety of factors identified below:

# ♦ Joint configuration

The designed shape of the surfaces to be joined and the amount of solder deposited determine the joint configuration. Established design rules can be applied. In addition, the product design should ensure that joints are not subjected to undue mechanical stress, for example due to CTE (coefficient of thermal expansion) mismatch.

Physical characteristics of the selected joining alloy (after processing) will inevitably determine joint reliability under mechanical stress.

# ♦ Processing

Controlled processing is required to ensure the alloy retains its expected physical characteristics after the joint is formed.

# ♦ Substrate finish

Metallisation on the joint surfaces can have a significant impact on joint reliability.

# ♦ Contamination

Contamination on joint surfaces before soldering will limit joint quality. Flux residues need to be fully removed to avoid corrosion and electrical leakage problems. Processes need to be set up to ensure all contamination is removed. The effectiveness can be checked by visual inspection, ionograph measurements

# ♦ Thermal mismatch

If the materials in the assembly are not carefully matched for CTE (coefficient of thermal expansion), slow or fast thermal cycling can cause significant damage to the solder connections.

# ♦ Mechanical strength

The method used to attach the components to the substrate needs to be robust enough to withstand mechanical stresses due to shock, acceleration, vibration, etc.

If reliability figures for PCB and interconnection need to be included in module MTBF calculations, the methodology described in MIL-HDBK-217F Notice 2 can be used.

# **SUMMARY**

With correct application of design rules, careful component selection and stringent process control, the reliability of a module is largely dependent on the inherent reliability of the active components used. Secondary factors are solder joint reliability and substrate (i.e. PCB) reliability.

An estimate of module reliability (MTBF) can be obtained by:-

1. Calculating the active device failure rate using the manufacturer's test data.
2. Calculating the interconnection assembly failure rate using MIL-HDBK-217F.
3. Summing the individually calculated failure rates from above.
4. Taking the reciprocal of the total failure rate to give the MTBF.

Note that the more that is known about the operating environment and conditions, the more accurate the calculated value.

# Appendix 1 - Calculation of Semiconductor Failure Rates

A fundamental part of understanding a product's reliability is understanding how its failure rate is calculated. The traditional method of determining a product's failure rate is through the use of accelerated high temperature operating life tests performed on a sample of devices randomly selected from its parent population. The failure rate obtained on the life test sample is then extrapolated to end-use conditions by means of predetermined statistical models to give an estimate of the failure rate in the field application.

Although there are many other stress methods employed by semiconductor manufacturers to fully characterize a product's reliability, the data generated from operating life test sampling is the principal method used by the industry for estimating the failure rate of a semiconductor device in field service.

Table 1 gives definitions of some of the terms used to describe the failure rate of semiconductor devices.

| TERMS                          | DEFINITIONS/DESCRIPTIONS |
|--------------------------------|--------------------------|
| Failure Rate (λ)               | Measure of failure per unit of time. The useful life failure rate is based on the exponential life distribution. The failure rate typically decreases slightly over early life, then stabilizes until wear-out, which shows an increasing failure rate. This should occur beyond useful life. |
| Failure In Time (FIT)          | Measure of failure rate in $10^9$ device hours; e.g. 1 FIT = 1 failure in $10^9$ device hours. |
| Total Device Hours (TDH)       | The summation of the number of units in operation multiplied by the time of operation. |
| Mean Time To Failure (MTTF)    | Mean of the life distribution for the population of devices under operation, or expected lifetime of an individual; MTTF = $1/\lambda$, which is the time by which 63.2% of the population has failed. Example: for $\lambda = 10$ FITs, MTTF = $1/\lambda = 100$ million hours. |
| Confidence Level or Limit (CL) | Probability level at which population failure rate estimates are derived from sample life test. The upper confidence limit is used. |
| Acceleration Factor (AF)       | A constant derived from experimental data which relates the times to failure at two different stresses. The AF allows extrapolation of failure rates from accelerated test conditions to use conditions. |

Table 1: FAILURE RATE PRIMER.

A simple failure rate calculation based on a single life test would follow equation 1.

$$\lambda \sim \frac{1}{TDH \times AF}$$
 (Eq. 1)

 $\lambda$  = failure rate.

TDH = Total Device Hours = Number of units x hours under stress.

AF = Acceleration factor, see Equation 3.

Since reliability data can be accumulated from a number of different life tests with several different failure mechanisms, a comprehensive failure rate is desired. The failure rate calculation can be complicated if there is more than one failure mechanism in a life test, since the failure mechanisms are thermally activated at different rates. Equation 2 accounts for these conditions and includes a statistical factor to obtain the confidence level for the resulting failure rate.

$$\lambda = \sum_{i=1}^{\beta} \left( \frac{x_i}{\left( \sum_{j=1}^{k} TDH_j \times AF_{ij} \right)} \right) \times \frac{M \times 10^9}{\sum_{i=1}^{\beta} x_i}$$
 (Eq. 2)

where,

 $\lambda$  = failure rate in FITs (number of failures in  $10^9$  device hours)

 $\beta$  = Number of distinct possible failure mechanisms

k = Number of life tests being combined

$x_i$ = Number of failures for failure mechanism i, i = 1, 2, ..., β

$TDH_j$ = Total device hours of test time for life test j, j = 1, 2, ..., k

$AF_{ij}$ = Acceleration factor for failure mechanism i in life test j

$$M = \chi^2_{(\alpha,\, 2r+2)}/2$$

where,

$\chi^2$ = chi-square factor for 2r + 2 degrees of freedom

$r$ = total number of failures ($\Sigma x_i$)

$\alpha$ = probability associated with the CL, between 0 and 1 (0.95 for a 95% CL, as in the example below).
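The M factor does not have to be read from a chi-square table. The sketch below is a minimal, standard-library-only illustration, assuming α is the confidence level expressed as a probability (as in the worked example, where α = 0.95). It relies on the standard identity that, for 2r + 2 degrees of freedom, the χ² value at probability α equals twice the Poisson mean at which r or fewer failures has probability 1 − α, and it finds that mean by bisection. The function names `poisson_cdf` and `m_factor` are illustrative.

```python
import math


def poisson_cdf(r: int, mean: float) -> float:
    """Probability of observing r or fewer events from a Poisson distribution."""
    term = math.exp(-mean)
    total = term
    for i in range(1, r + 1):
        term *= mean / i
        total += term
    return total


def m_factor(r: int, cl: float) -> float:
    """Return M = chi-square(cl, 2r + 2) / 2 for r failures at confidence level cl.

    Uses P(chi2 with 2r + 2 dof > 2m) = P(Poisson(m) <= r): M is the Poisson
    mean m at which r or fewer failures has probability 1 - cl.
    """
    if not 0.0 < cl < 1.0:
        raise ValueError("cl must lie strictly between 0 and 1")
    target = 1.0 - cl
    lo, hi = 0.0, 1.0
    # Grow the upper bracket until the (decreasing) Poisson CDF falls below target.
    while poisson_cdf(r, hi) > target:
        hi *= 2.0
    # Bisection: the root always stays between lo and hi.
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(r, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `m_factor(2, 0.95)` evaluates to about 6.30 and `m_factor(0, 0.95)` to about 2.996, the values used in the worked example and in Appendix 2.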

In the failure rate calculation, acceleration factors ($AF_{ij}$) are used to derate the failure rate from the thermally accelerated life test conditions to a failure rate indicative of actual use temperature. The acceleration factor is determined from the Arrhenius equation. This equation is used to describe physico-chemical reaction rates and has been found to be an appropriate model for expressing the thermal acceleration of semiconductor device failure mechanisms.

$$AF = exp\left(\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{stress}}\right)\right)$$
 (Eq. 3)

where,

AF = Acceleration Factor

$E_a$ = Thermal Activation Energy (Table 2)

k = Boltzmann's Constant ($8.63 \times 10^{-5}$ eV/K)

$T_{use}$ = Use Temperature (°C + 273)

$T_{stress}$ = Life test stress temperature (°C + 273)

Both $T_{use}$ and $T_{stress}$ (in kelvin) need to include the internal temperature rise of the device so that they represent the junction temperature of the chip under bias.
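Eq. 3 translates directly into a small function. This is a sketch, assuming temperatures are supplied as junction temperatures in °C (ambient plus internal rise) and converted with +273, with Boltzmann's constant taken as 8.63 × 10⁻⁵ eV/K to match the figures in this appendix; `arrhenius_af` is a hypothetical name.

```python
import math

# Boltzmann's constant as used in this document (eV/K).
BOLTZMANN_EV_PER_K = 8.63e-5


def arrhenius_af(ea_ev: float, tj_use_c: float, tj_stress_c: float) -> float:
    """Eq. 3: acceleration factor from stress conditions to use conditions.

    Temperatures are junction temperatures in degrees C and are converted
    to kelvin with +273, as in the text.
    """
    t_use = tj_use_c + 273.0
    t_stress = tj_stress_c + 273.0
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use - 1.0 / t_stress))


# Worked example inputs: 55 C use and 150 C stress ambient, each plus a 20 C rise.
af_photoresist = arrhenius_af(0.7, 55 + 20, 150 + 20)
af_oxide = arrhenius_af(0.3, 55 + 20, 150 + 20)
```

With the inputs of the worked example below, the two factors come out at about 148.2 and 8.52.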

Failure rates for commercial, industrial and military applications are generally published at 55°C with a 60% CL within the semiconductor industry. Critical system applications sometimes specify a 90% or 95% CL at 55°C or 125°C.

The thermal activation energy ($E_a$) of a failure mechanism is determined by performing tests at a minimum of two different temperature stress levels. The stresses will provide the time to failure ($t_f$) for the two (or more) populations, thus allowing the simultaneous solution for the activation energy as follows:

$$ln(t_{f1}) = C + \frac{E_a}{kT_1}$$
 (Eq. 4)

$$ln(t_{f2}) = C + \frac{E_a}{kT_2}$$
 (Eq. 5)

By subtracting the two equations, and solving for the activation energy, the following equation is obtained.

$$E_a = \left(k \times \frac{\ln(t_{f1}) - \ln(t_{f2})}{(\frac{1}{T_1} - \frac{1}{T_2})}\right)$$
 (Eq. 6)
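Eq. 6 can likewise be evaluated directly once two times to failure are known. A minimal sketch, with `activation_energy` a hypothetical helper and the input times purely illustrative (not measured data):

```python
import math

BOLTZMANN_EV_PER_K = 8.63e-5  # eV/K, as used in this document


def activation_energy(t_f1: float, temp1_k: float, t_f2: float, temp2_k: float) -> float:
    """Eq. 6: activation energy (eV) from times to failure at two absolute temperatures."""
    return (BOLTZMANN_EV_PER_K * (math.log(t_f1) - math.log(t_f2))
            / (1.0 / temp1_k - 1.0 / temp2_k))


# Hypothetical data: time to failure of 1000 h at 423 K and 250 h at 448 K.
ea = activation_energy(1000.0, 423.0, 250.0, 448.0)
```

For these illustrative inputs the result is about 0.91 eV. The order of the two populations does not matter, since swapping them changes the sign of both the numerator and the denominator.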

Table 2 below lists several different failure mechanisms, their cause, and the activation energy associated with each. If no failure is recorded for the sample on life test, the default activation energy is 1.0 eV. For an unknown failure mechanism, an activation energy of 0.7 eV is assumed. Also listed is a possible screen to find the failure mechanism and how to control the problem if it occurs.

**Table 2: FAILURE MECHANISM**

| Failure<br>Mechanism | Activation<br>Energy | Screening and Testing<br>Methodology | Control Methodology |
|---|---|---|---|
| Oxide Defects | 0.3 - 0.5 eV | High Temperature Operating Life (HTOL) and voltage stress. | Statistical Process Control of oxide parameters, defect density control, and voltage stress testing. |
| Silicon Defects<br>(Bulk) | 0.3 - 0.5 eV | HTOL and voltage stress screens. | Vendor statistical Quality Control programs, and Statistical Process Control on thermal processes. |
| Corrosion | 0.45 eV | Highly Accelerated Stress Testing (HAST). | Passivation dopant control, hermetic seal control, improved mold compounds, and product handling. |
| Assembly Defects | 0.5 - 0.7 eV | Temperature cycling, temperature and mechanical shock, and environmental stressing. | Vendor statistical Quality Control programs, Statistical Process Control of assembly processes, and proper handling. |
| Electromigration<br>- Al line<br>- Contact/Via | 0.6 eV<br>0.9 eV | Test vehicle characterizations at highly elevated temperatures. | Design process groundrules to match measured data, statistical control of metals, photoresist and passivation. |
| Mask Defects/<br>Photoresist Defects | 0.7 eV | Mask Fab comparisons, print checks, defect density monitor in Fab, voltage stress test and HTOL. | Clean room control, clean mask, pellicles, Statistical Process Control of photoresist/etch processes. |
| Contamination | 1.0 eV | C-V stress of oxides, wafer fab device stress test and HTOL. | Statistical Process Control of C-V data, oxide/interconnect cleans, high integrity glassivation and clean assembly process. |
| Charge Injection | 1.3 eV | HTOL and oxide characterization. | Design groundrules based on test results, wafer level Statistical Process Control of gate length and control of gate oxide thickness. |

# Example

Here is a simple example of how the above equations can be used to calculate the failure rate from life test data. Assume that 600 parts were stressed at 150°C ambient for 3000 hours, with one failure at 2000 hours due to a photoresist flaw (0.7 eV) and one failure at 3000 hours due to an oxide defect (0.3 eV). The internal temperature rise (Tj) of the part is 20°C, and the product was tested at 1000, 2000 and 3000 hours. We want to find the FIT rate for the process with a 95% CL at 55°C.

Using the Arrhenius relationship (Eq. 3), the acceleration factors are computed as follows:

$$AF_1 = exp \left[ \frac{0.7}{8.63 \times 10^{-5}} \left( \frac{1}{348} - \frac{1}{443} \right) \right] = 148.2$$
 Photoresist flaw (Eq. 7)

$$AF_2 = exp\left[\frac{0.3}{8.63 \times 10^{-5}} \left(\frac{1}{348} - \frac{1}{443}\right)\right] = 8.52$$
 Oxide defect (Eq. 8)

$\chi^2$ = 12.6, which depends on the degrees of freedom, 2r + 2 = 6 for r = 2 failures, and on α = 95/100 = 0.95 for CL = 95%.

M is then simply $\chi^2/2$ = 6.3.

The total device hours (TDH) is derived from the summation of the devices on stress multiplied by their test duration.

From Eq. 2, $\lambda$ in FITs is computed as follows:

$$\lambda = \left(\frac{1}{1.797 \times 10^6 \times 148.2} + \frac{1}{1.797 \times 10^6 \times 8.52}\right) \times \frac{6.3 \times 10^9}{2} = 218 \text{ FITs}$$
 (Eq. 9)

Note that for a 60% CL, $\chi^2$ = 6.2, which yields 107 FITs.

The MTTF can be calculated from the reciprocal of the FIT rate multiplied by $10^9$:

$$MTTF = (\frac{1}{FITs}) \times 10^9 = 4.59 \times 10^6 \text{ hours @ 95\%CL}$$
 (Eq. 10)

$$MTTF = (\frac{1}{FITs}) \times 10^9 = 9.35 \times 10^6 \text{ hours } @ 60\%CL$$
 (Eq. 11)
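The whole of Eq. 2 can be expressed as a short function that loops over failure mechanisms and life tests. A sketch, assuming M has already been obtained from a chi-square table and that the acceleration factors are those of Eq. 7 and Eq. 8; `failure_rate_fit` is an illustrative name. TDH is taken as the 1.797 × 10⁶ device hours used in Eq. 9, which is consistent with crediting each failed unit only up to its last passing read point (598 × 3000 + 1000 + 2000 hours), although the text does not state that convention.

```python
def failure_rate_fit(failures, tdh, af, m):
    """Eq. 2: combined failure rate in FITs.

    failures: x_i, number of failures for each failure mechanism i
    tdh:      TDH_j, total device hours for each life test j
    af:       af[i][j], acceleration factor for mechanism i in life test j
    m:        M = chi-square(alpha, 2r + 2) / 2, with r = sum(failures)
    """
    r = sum(failures)
    if r == 0:
        raise ValueError("Eq. 2 needs at least one failure (it divides by the total)")
    weighted = 0.0
    for i, x_i in enumerate(failures):
        # Equivalent device hours at use conditions for mechanism i.
        equivalent_hours = sum(tdh_j * af[i][j] for j, tdh_j in enumerate(tdh))
        weighted += x_i / equivalent_hours
    return weighted * m * 1e9 / r


# Worked example: one life test, two mechanisms, one failure each.
tdh = [1.797e6]
af = [[148.2], [8.52]]  # photoresist flaw, oxide defect
fit_95 = failure_rate_fit([1, 1], tdh, af, m=6.3)      # chi-square 12.6 / 2
fit_60 = failure_rate_fit([1, 1], tdh, af, m=6.2 / 2)  # chi-square 6.2 / 2
mttf_95_hours = 1e9 / fit_95
```

After rounding, this reproduces the 218 FITs of Eq. 9 and the 107 FITs quoted for a 60% CL; the MTTF then follows as $10^9$ divided by the FIT value. Because Eq. 2 as written divides by the total number of failures, a zero-failure test needs the simpler form used in Appendix 2.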

# Appendix 2 - Examples

# SYS32256LK-020 MTBF

The SYS32256LK is a plastic 8Mbit Static RAM module (organised as $256K \times 32$) in a 64-pin SIMM footprint. The bill of materials is as follows:-

- 8 off 256K×4 SRAM in 28 SOJ (e.g. Samsung KM641001AJ)
- 1 off 6-layer multilayer PCB, FR4 laminate (240 surface mount soldered connections)
- 8 off 0.1µF multilayer ceramic chip capacitors

# Active Component Reliability

The assumed operating conditions are 55°C ambient with a 5.5V supply, and the confidence level applied is 95%. The internal temperature rise of the component due to self-heating (Tj) is estimated at 20°C.

The calculated reliability depends upon the components used to build the module. For the purposes of this calculation, data for the Samsung KM641001AJ has been used, following the method described in Appendix 1:-

Test Data (Samsung reliability monitor):

1000 pcs stressed at 7V, 125°C, 96 Hrs with 0 failures

387 pcs stressed at 7V, 125°C, 1008 Hrs with 0 failures

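Assuming a 0.5 eV activation energy, and taking the junction temperature as 75°C (55°C + 20°C) in use and 145°C (125°C + 20°C) under stress, the acceleration factors are calculated as 16.24 (for temperature) and 31.6 (for voltage).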

Therefore the total number of equivalent device hours is:-

$(1000 \times 96 + 387 \times 1008) \times 31.6 \times 16.24 = 2.49 \times 10^{8}$ device hours

The $\chi^2/2$ value for 0 failures at a 95% confidence level is 2.996.

Therefore the FIT rate is $2.996 / (2.49 \times 10^8) \times 10^9 = 12$ FIT.

The MTBF can be calculated from the reciprocal of the FIT rate multiplied by $10^9$:

ACTIVE COMPONENT MTBF = $10^9/12 = 83 \times 10^6$ hours @ 95%CL
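The active-component arithmetic above can be reproduced in a few lines. A sketch, assuming, as the 16.24 figure implies, that the 20°C self-heating applies at both the 55°C use and the 125°C stress ambient; the 31.6 voltage factor is used as quoted because its model is not given. For zero failures the χ²/2 factor at 95% CL reduces to −ln(0.05) ≈ 2.996.

```python
import math

BOLTZMANN_EV_PER_K = 8.63e-5  # eV/K, as used in this document

# Samsung KM641001AJ reliability monitor data quoted above: (units, hours),
# both at 7 V and 125 C ambient, with zero failures.
monitor = [(1000, 96), (387, 1008)]

# Temperature AF (Eq. 3) for Ea = 0.5 eV, with the assumed 20 C self-heating
# added to both the 55 C use and 125 C stress ambients.
t_use_k = 55 + 20 + 273.0
t_stress_k = 125 + 20 + 273.0
af_temp = math.exp(0.5 / BOLTZMANN_EV_PER_K * (1 / t_use_k - 1 / t_stress_k))
af_volt = 31.6  # voltage AF as quoted; its model is not given in the text

equivalent_hours = sum(n * h for n, h in monitor) * af_temp * af_volt

# With zero failures, chi-square(0.95, 2) / 2 reduces to -ln(1 - 0.95).
m_zero = -math.log(1 - 0.95)
fit = m_zero / equivalent_hours * 1e9
mtbf_hours = 1e9 / fit
```

The result is about 12 FIT and an MTBF of about 83 × 10⁶ hours, in line with the figures above.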

# PCB & Interconnection Reliability

Following the surface mount interconnection calculation in MIL-HDBK-217F Notice 2, with the following values:-

- d = 325 mils (28 SOJ 400 mil)
- h = 5 mils
- $\alpha_S$ = 20 (FR4 multilayer)
- $\Delta T$ = 21 (Ground, fixed)
- $\alpha_{CC}$ = 7 (plastic)
- $T_{RISE}$ = 20°C
- $\pi_{LC}$ = 150 (J lead)
- CR = 0.021 cycles/hr (Industrial)
- Design life (LC) = 20 years

Therefore, inserting values and calculating:-

$N_f$ = 60,836 thermal cycles to failure.

$\alpha_{SMT}$ = 60,836 / 0.021 = 2,897,000 hours

$LC/\alpha_{SMT}$ = (20 × 8760) / 2,897,000 = 0.060

Therefore, from the MIL-HDBK-217F ECF table, ECF = 0.13.

$\lambda_{SMT}$ = 0.13 / 2,897,000 = $4.5 \times 10^{-8}$ failures per hour

$\lambda_{SMT}$ = 45 FIT

The MTBF can be calculated from the reciprocal of the FIT rate multiplied by $10^9$:

PCB/INTERCONNECT MTBF = $10^9/45 = 22 \times 10^6$ hours
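The PCB/interconnect steps after $N_f$ reduce to simple arithmetic. The sketch below starts from the already-calculated $N_f$ rather than re-deriving it, and takes ECF as an input because the MIL-HDBK-217F ECF table is not reproduced here; it returns LC/α_SMT so that the table lookup can be checked. `smt_failure_rate` is a hypothetical name.

```python
HOURS_PER_YEAR = 8760


def smt_failure_rate(n_f: float, cycle_rate_per_hr: float,
                     design_life_years: float, ecf: float):
    """Follow the alpha_SMT, LC / alpha_SMT and lambda_SMT steps above.

    n_f: thermal cycles to failure (taken as already calculated)
    ecf: ECF value read from the MIL-HDBK-217F table (not reproduced here)
    Returns (alpha_smt in hours, LC / alpha_smt, failure rate in FIT).
    """
    alpha_smt = n_f / cycle_rate_per_hr  # hours to reach n_f cycles
    life_ratio = design_life_years * HOURS_PER_YEAR / alpha_smt  # selects the ECF row
    lambda_per_hour = ecf / alpha_smt
    return alpha_smt, life_ratio, lambda_per_hour * 1e9


alpha_smt, life_ratio, fit = smt_failure_rate(60_836, 0.021, 20, 0.13)
```

With $N_f$ = 60,836 cycles, CR = 0.021 cycles/hr, a 20-year design life and ECF = 0.13, this gives about 45 FIT.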

# Passive Device Reliability

The contribution of passive devices (i.e. capacitors) is negligible and can be ignored.

# SYS32256LK-020 Reliability

The total module reliability is given by the sum of all of the FIT rates of contributing elements:-

Active components = 8 × 12 = 96 FITs

PCB/Interconnections = 45 FITs

Therefore SRAM Module = 141 FITs

SYS32256LK-020 MTBF = $10^9/141 \approx 7 \times 10^6$ hours (800 years)

# SYS32128LK-020 MTBF

The SYS32128LK is a plastic 4Mbit Static RAM module (organised as 128K × 32) in a 64-pin SIMM footprint. The bill of materials is as follows:-

- 4 off 128K×8 SRAM in 32 SOJ (e.g. Samsung KM681001AJ)
- 1 off 6-layer multilayer PCB, FR4 laminate (136 surface mount soldered connections)
- 4 off 0.1µF multilayer ceramic chip capacitors

# Active Component Reliability

The assumed operating conditions are 55°C ambient with a 5.5V supply, and the confidence level applied is 95%. The internal temperature rise of the component due to self-heating (Tj) is estimated at 20°C.

The calculated reliability depends upon the components used to build the module. For the purposes of this calculation, data for the Samsung KM681001AJ has been used.

The KM681001AJ essentially uses the same die as the KM641001AJ above (with only a second-layer metal pattern change). Therefore the estimated MTBF should be the same:-

ACTIVE COMPONENT MTBF = $83 \times 10^6$ hours @ 95%CL

# PCB & Interconnection Reliability

The PCB used for this module is similar to the PCB above. The only difference is the use of 32 SOJ instead of 28 SOJ components. As a result the calculated figure is slightly worse because of a higher value for the d parameter.

d = 375 mils (32 SOJ 400 mil)

Working through the calculation in the same way gives:-

$\lambda_{SMT}$ = 62 FIT

The MTBF can be calculated from the reciprocal of the FIT rate multiplied by $10^9$:

PCB/INTERCONNECT MTBF = $10^9/62 = 16 \times 10^6$ hours

# Passive Device Reliability

The contribution of passive devices (i.e. capacitors) is negligible and can be ignored.

# SYS32128LK-020 Reliability

The total module reliability is given by the sum of all of the FIT rates of contributing elements:-

Active components = 4 × 12 = 48 FITs

PCB/Interconnections = 62 FITs

Therefore SRAM Module = 110 FITs

SYS32128LK-020 MTBF = $10^9/110 \approx 9 \times 10^6$ hours (1037 years)
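The module totals can be checked by summing the FIT contributions and taking the reciprocal. A sketch, assuming continuous operation (8760 hours per year) for the conversion to years and neglecting passive devices as the text does; `module_reliability` is an illustrative name.

```python
HOURS_PER_YEAR = 8760  # continuous operation assumed


def module_reliability(fit_per_device: float, n_devices: int, interconnect_fit: float):
    """Sum the contributing FIT rates and convert to MTBF in hours and years."""
    total_fit = n_devices * fit_per_device + interconnect_fit  # passives neglected
    mtbf_hours = 1e9 / total_fit
    return total_fit, mtbf_hours, mtbf_hours / HOURS_PER_YEAR


# FIT values quoted in this appendix: (FIT per SRAM, number of SRAMs, PCB/interconnect FIT)
modules = {
    "SYS32256LK-020": (12, 8, 45),
    "SYS32128LK-020": (12, 4, 62),
}

for name, (fit_each, count, pcb_fit) in modules.items():
    total, hours, years = module_reliability(fit_each, count, pcb_fit)
    print(f"{name}: {total} FIT, MTBF {hours:.2e} hours ({int(years)} whole years)")
```

This gives 141 FIT (about 7.1 × 10⁶ hours) for the SYS32256LK-020 and 110 FIT (about 9.1 × 10⁶ hours, 1037 whole years) for the SYS32128LK-020.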
